Robust Strong Lensing Time Delay Estimation 
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Strong gravitational lensing of time variable sources such as quasars and supernovae creates ob- 
servable time delays between the multiple images. Time delays can provide a powerful cosmographic 
probe through the "time delay distance" involving the ratio of lens, source, and lens-source distances. 
However, lightcurves of lensed images have measurement gaps, noise, systematics such as microlens- 
ing from substructure along an image line of sight, and no a priori functional model, making robust 
time delay estimation challenging. Using Gaussian process techniques, we demonstrate success in 
accurate blind reconstruction of time delays and reduction in uncertainties for real data. 



I. INTRODUCTION 

Multiple images of a single source are dramatic evi- 
dence for the effect of gravity, specifically general rel- 
ativity, on light. This strong gravitational lensing not 
only splits the images, but magnifies or demagnifies the 
source flux and induces time delays between the images. 
The time delays arise from both the geometric path dif- 
ferences along the various lines of sight and the gravita- 
tional potential differences traversed by the photons. 

When the source is variable, such as from a quasar or 
supernova, the time delays in the flux of one image rela- 
tive to another can be observed. With careful modeling 
of the lens mass distribution, and measurement of the 
angular positions of the images, the geometric factors of 
distances between observer and lens, observer and source, 
and lens and source can be extracted as a ratio called the 
time delay distance. Recent advances in lens modeling 
[1, 2] and careful, long term flux monitoring programs 
such as CosmoGrail [3] (also see [4, 5]) have matured 
strong lensing time delays to an incipient cosmographic 
probe. 

This prospect is exciting for several reasons. Since time 
delays over cosmological distances are sensitive not just 
to the overall scale, or Hubble constant, but the cos- 
mic energy density and its evolution with redshift, one 
can constrain (combinations of) the matter and dark en- 
ergy densities and dark energy equation of state. More- 
over, the time delay distance acts fundamentally differ- 
ently from luminosity and angular distances measured by 
calibrated standard candles such as Type la supernovae 
and rulers such as baryon acoustic oscillations. Hence 
it has distinct covariances among cosmological parame- 
ters and can be powerful in complementarity with the 
standard distance probes [6, 7]. Finally, despite the lens 
mass modeling, strong lensing time delays are a geomet- 
ric probe and are tied only to the late universe, unique 
except for supernovae (but with different systematics and 
covariances) among all cosmological probes. 

Here we address one important element of the use of 
lensing time delays: accurate estimation of the actual 
time delays. While great progress has been made in re- 
cent years (see, e.g., [8-10]), in large part due to heroic 



observing programs and improved data sets, this is not a 
solved problem. Mathematically, one can consider it as 
reconstructing a shift between multiple noisy, irregularly 
sampled, differentially amplified data streams. We apply 
a special combination of Gaussian process statistics to 
this task. Such a concept for strong lensing dates back 
to [11] and more recently has been shown to have reason- 
able success [8]; we introduce several new features that 
exhibit noticeable improvement in the state of art. 

Section II outlines the challenge of reconstructing the 
time delays from realistic data complete with systematics 
such as microlensing. The Gaussian process methodology 
is described in Sec. Ill, introducing the various correla- 
tion function terms and accounting for systematics. We 
test the method against blinded mock data, and real data 
from the literature, in Sec. IV, and conclude in Sec. V. 



II. TIME DELAYED LIGHTCURVES 

Fluxes received from an image at several times define 
a lightcurve, but the name is misleading since the data 
are not continuous but discrete, and the observations are 
often irregular and sparse and have measurement uncer- 
tainties. The best monitoring frequency may be every 
day or two, while long gaps of a few months occur due 
to seasonal visibility of regions of the sky from a single 
telescope. The cadence is often irregular, though ongo- 
ing wide area surveys such as Dark Energy Survey (DES 
[12]), Kilodegree Survey (KIDS [13]), and PanSTARRS 
[14], and in the future LSST [15], may have regular ob- 
servations with periods of several days. 

Apart from the sparscness, the data has photomet- 
ric measurement noise. Most current observations come 
from small (1 meter) telescopes, and atmosphere, tele- 
scope, and detector noise all contribute. With wide field 
surveys, hundreds to thousands of time delay systems 
may be found, enabling choice of the cleanest for use as 
time delay distance probes. Since to obtain a time de- 
lay distance one must have a robust model of the lens 
mass distribution, galaxy lenses are preferred over clus- 
ter lenses due to less complex modeling. Depending on 
lens mass and geometry this implies time delays in the 



range of a few to hundred days in general. 

Comparing lightcurves from different images involves 
some form of cross-correlation, looking for the time delay 
between them. Straightforward cross-correlation tech- 
niques tend not to work well due to the noisiness and 
sparseness of the data, and extrinsic contributions (see, 
e.g., [16]). Instead of comparing noisy data with noisy 
data, regression techniques attempt to reconstruct the 
underlying true source variation and compare the image 
measurements to that. We employ Gaussian processes 
(GP) as the regression technique. See [17] for an example 
of its application to (non-lensed) supernova lightcurves. 

In addition to measurement difficulties, astrophysical 
systematics contribute to the challenge of time delay es- 
timation. Further time variations arise from microlcns- 
ing caused by passage of substructure near to the line 
of sight. This affects images independently, breaking the 
(delayed) coherence between them, and can occur on all 
time scales. Short variations just add noise but long term 
variations disrupt the relation between the lightcurves for 
large portions of the data set and so can cause misesti- 
mation of the time delay. These long term variations 
are moderately smooth and some previous work has used 
low order polynomials or splines to represent them; we 
instead allow the data to determine their time scale. 

Thus we have three elements entering into the light 
curves: the intrinsic variation that we want to measure, 
the observational noise, and the astrophysical microlens- 
ing systematic (in fact our formalism would allow multi- 
ple versions of the last two). The challenge of robust time 
delay estimation is to reconstruct phase shifts of a source 
with unknown intrinsic flux variation, for images with in- 
dependent microlensing magnifications along their lines 
of sight, using noisy data with irregular temporal sam- 
pling. Figure 1 shows an example of real lightcurves from 
four images of quasar HE 0435-1223 measured by Cos- 
moGrail [18]. Conventionally observations are reported 
in magnitudes (logarithmic flux units). 
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FIG. 1. Magnitudes (log flux) of four images of the quasar 
HE 0435-1223 are plotted vs time, with an arbitrary overall 
zeropoint. 



III. TIME DELAY ESTIMATION 

For reconstructing an intrinsic function from isolated, 
noisy data points, Gaussian processes offer a robust, sub- 
stantially model independent statistical method with well 
defined error characterization. See [19] for a thorough 
discussion of GP from a statistical point of view. The 
basic idea is that the function is not parametrized, but 
rather the data are fit to a whole family of possible curves, 
given by a Gaussian distribution with a mean function 
and a covariance kernel between points. 

The key choices are the form of the mean function 
(which ideally does not affect the final fit but in practice 
a poor mean function can lead to difficulties) and the co- 
variance kernel, together with any hyperparameters used 
in those functions. There is a single GP representing 
the true source underlying all of the images plus the mi- 
crolensing of a reference image. For a mean function we 
adopt a constant function, then allow hyperparameters 
for magnifications relative to the reference image. We 
try different reference images to test for robustness. 

For the covariance kernel we adopt a Matern function 
with index 3/2, 



n^Hj Zj 
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(1) 



where U and tj are measurement times, the hyperparam- 
eter a adjusts the amplitude of the kernel and I func- 
tions as a correlation length. The Matern function is 
commonly used in statistics [19] and allows for greater 
roughness in the variation than another common choice, 
the squared exponential or Gaussian, 
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We will compare the results for these two kernels; gen- 
erally we find that they work comparably well, with the 
Matern case often having somewhat smaller statistical 
errors and bias. 

We include measurement noise and an additional 
nugget term <7„<%, which acts as a zero lag dispersion, 
i.e. the noise in a point process at a given time. This 
is distinct from the GP amplitude a in that a accounts 
for the global variations of the kernel whereas the nugget 
term a n accounts for the independent dispersion of the 
individual data points around the predicted GP value. 

The microlensing systematic has been attempted to be 
addressed in the literature by multiplying the lightcurves 
by a quadratic polynomial or a cubic spline over short 
time spans or within an observing season. This restricts 
the allowed variations and has the potential to lead to 
bias in the reconstructed time delays or simply a failed 
fit. We remain within the GP framework, which does not 
impose a specific model or timescale for the microlens- 
ing, and account for the microlensing with a GP for each 
image (other than the reference one) with zero mean 
function and a squared exponential kernel of common 



amplitudes a 1 and correlation lengths l^. To separate 
the microlensing GP from the quasar GP, we require a 
long correlation length L (systems with the microlens- 
ing timescale comparable to the intrinsic variations are 
not useful for time delay measurement). We have inves- 
tigated various choices of priors, for example ^(l^) > 50 
days, 7r(Z M ) > season, or n(l^) > 31; all give equivalent 
results. 

In summary, the lightcurve predictions for our full GP 
regression take the form 



yi~GPQ(0 Q h P ;*-*i) (3) 

V2~GP Q (0 Qhp ; t - 1 2 ) + GP„ 2 (0 Mhp ) + Am 2 (4) 
y 3 ~ GP Q (6 Qhp ; t - t 3 ) ± GP m3 (W + Am 3 (5) 

(6) 

and so forth for each image, where #Qhp is the hyper- 



parameter vector for the quasar GP, 



Vhp 



is for the mi- 



crolensing GP, and Am represents the magnification rel- 
ative to the reference image 1. 
The GP likelihood is [19] 

2\np{Y\6) = -Y T K~ 1 Y-\n\K\ - N d ln27r, (7) 

where Y is the vector of magnitude data, with Nd the 
total number of data points, 9 represents the fit param- 
eters, e.g. time delays, and K is the full kernel with \K\ 
being its determinant. The likelihood is maximized for 
the most likely values of the time delays and magnifica- 
tions, which we find using the function minimizer routine 
Minuit [20] and have validated using a Monte Carlo anal- 
ysis. 

In principle, we can combine all lightcurves at once, 
compare two at a time, or any number of lightcurves. 
Simultaneous analysis of more than two curves allows a 
consistency check in the form of the triangle equality, e.g. 
Ai^c = Ai^s + Aisc, and is our baseline approach. Us- 
ing more lightcurves also has the advantage of the lever- 
age of more images on simultaneously constraining the 
underlying source light curve. Analysis using just a pair 
has fewer hyperparameters and may deliver smaller sta- 
tistical errors, but at the risk of bias. We carry out cross- 
checks by trying different numbers of lightcurves in the 
analysis, finding that the results from the pair analyses 
can provide useful initial conditions to the simultaneous 
fit. One can also use portions of data, such as selected 
observation seasons, to cross check the consistency of the 
results or to reduce the impact of microlensing as has 
been done in the literature before. We find the results 
from our approach to be robust to the number of data 
points used in the analysis. 

In summary, when fitting N lightcurves we have the 
N — 1 time delay parameters that are our goal, the TV— 1 
magnifications Am, and the hyperparameters a 2 , c^, a 1 , 



IV. TESTS AND RESULTS 

A. Blind mock data 

To test the accuracy and robustness of the method 
we initially created blinded mock data sets. To pre- 
serve realistic sampling and data quality, one author 
took lightcurve data from one image of quasar HE 0435- 
1223, realized three new lightcurves using random Gaus- 
sian distributions with mean zero and standard deviation 
equal to the data errors, and shifted each of the result- 
ing lightcurves vertically by various magnifications and 
horizontally by time delays. The shifted data were then 
resampled onto the original time sampling using linear 
interpolation. Another author, unaware of the simulated 
time delay and magnification values, was given the final 
data points with error bars and carried out the GP fit. 

The results are shown in Table I, with the true values of 
15.0 and 25.0 day delays recovered within the 68% confi- 
dence level. Several other tests with different time delays 
had similar results. Both Matern and squared exponen- 
tial kernels work well. We find that the magnification and 
nugget terms are both important to include. Time delays 
are also tested for robustness by choosing different ref- 
erence curves and different multiplicities (i.e. fitting for 
the AB time delay in isolation, or simultaneously fitting 
the GP to more than two lightcurves). Quoted values 
reflect the central values and uncertainties from the con- 
figuration that has the best reduced x 2 and the smallest 
errors. These uncertainties are marginalized over all the 
other parameters and hyperparameters. 



Kernel 


At AB 


At A c 


Ai sc 


Matern 
SqExp 


14.3 ±0.8 
13.9 ±1.3 


25.1 ±0.9 
25.8 ±1.4 


10.8 ±0.9 
10.6 ±0.7 


Matern 2-curvc 
Sq Exp 2-curve 


14.8 ±1.0 
15.0 ±1.2 


25.1 ±1.2 

25. 7 ±1.4 


11.2 ±1.0 
12.0 ±1.4 



TABLE I. Blind analysis of time delays works for both 
Matern and squared exponential GPs, whether analyzing 
all lightcurves simultaneously (our standard approach) or in 
pairs ("2-curve"). The input to the simulation had AtAB = 
15.0 days, AtAC — 25.0 days. 



Figure 2 shows the ID and 2D joint likelihood contours 
for the time delay parameters in the mock data case. As a 
comparison, these results are obtained using CosmoMC 
[21] as a generic Monte Carlo sampler, and are wholly 
consistent with the Minuit results. For all the parameters 
and hyperparameters we impose a very wide flat prior 
and let data decide their values. The only constraint is 
on the microlensing correlation length, which as discussed 
should not be too small and hence mix with the actual 
correlation length of the GP kernel. 

A larger and more sophisticated series of data chal- 
lenges is forthcoming as part of the LSST Dark Energy 
Science Collaboration strong lensing working group. 
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FIG. 2. Marginalized ID and 2D likelihood contours are il- 
lustrated for the two time delays in the mock data case. The 
fiducial value is marked with a white plus sign. 



B. Actual data 

The second part of testing the GP method involves us- 
ing public data sets from CosmoGrail and other literature 
sources [3, 4, 22-24] as inputs for time delay estimation. 
These results can then be compared to the literature re- 
sults obtained using a variety of different methods. 

We use the two sets of CosmoGrail lightcurves pub- 
licly available at [18], for quasars HE 0435-1223 and 
WFI J2033-4723, and the radio lightcurves of quasar 
B1608+656, courtesy of Chris Fassnacht. Table II com- 
pares the results we obtain from our baseline GP analysis 
using the Matern kernel with those published in the lit- 
erature. The values are consistent with each other, with 
the GP analysis tending to have smaller errors. Note the 
true values of the time delays are not known, but the 
consistency offers an indication of robustness. 

The GP analysis not only estimates the time delays, a 
key input for cosmography through time delay distances, 
but provides information on the intrinsic quasar variabil- 
ity, the variations around the best fit GP lightcurve, and 
the microlensing systematics through the hyperparame- 
ters such as the correlation lengths, GP amplitudes, and 
nugget. 

We find that there is no significant correlation between 
the parameters. The nugget term is usually important 
and has a value comparable to the errors on the data 
points. We also find that including the microlensing term 
is useful even when there is no significant microlensing in 
the system. 



The quasar HE 0435-1223 (Fig. 1) has a long obser- 
vation period with distinct features in the intrinsic vari- 
ability, making it fairly straightforward to compute the 
time delays. The bottom curve has significant microlens- 
ing variation which leads to large microlensing amplitude 
<7 M . The microlensing correlation length (~ 700 days) 
is completely separated from the quasar GP correlation 
length (~ 100 days). There is strong agreement between 
our results, those of Literature 1 [3] that uses only the 
first two observation seasons, and those of Literature 2 
[23]. Our uncertainties are often smaller by a factor of 
two. 

The quasar WFI J2033-4723 has a relatively shorter 
observation time but distinct features in the light curves. 
There is no significant long-range microlensing and hence 
<7 M is very small indicating that including microlensing 
terms may not be necessary (but this is not known a pri- 
ori). Again, despite using several hyperparameters, our 
marginalized errors are comparable or smaller to results 
from [22]. 

The quasar B1608+656 (lightcurves shown in Fig. 3) 
is an example of a challenging system with large data 
gaps, relatively small intrinsic variability, and significant 
microlensing, all of which make it hard to estimate the 
time delays of its images. While we have successfully 
derived the time delays between all the images, includ- 
ing the cases not presented in the literature, the errors 
bars are relatively large. This is in part due to the fea- 
tureless light curves (especially the bottom curve, D in 
Table II, which is almost flat) and also due to the fact 
that our errors are marginalized over other parameters. 
For example, fitting the nugget term increases the errors 
by at least a factor of two while its presence is relatively 
unimportant for this system. 
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FIG. 3. Magnitudes (log flux) of four images of the quasar 
B1608+656 are plotted vs time, with an arbitrary overall ze- 
ropoint. 



V. CONCLUSIONS 

Accurate estimation of strong lensing time delays is an 
essential element in the use of time delay distances as 



Kernel 


At A B 


At A c 


At A D 


At B c 


At BD 


AtcD 


HE 0435-1223 GP 
HE 0435-1223 Lit(l) [3] 
HE 0435-1223 Lit (2) [23] 


-9.6 ±1.1 

-8.4 ±2.1 
-8.8 ±2.4 


-1.5 ±1.1 

-0.6 ±2.3 
-2.0 ±2.7 


-14.0 ±0.9 
-14.9 ±2.1 
-14.7 ±2.0 


8.1 ±1.1 

7.8 ±0.8 
6.8 ±2.7 


-5.0 ±1.1 
-6.5 ±0.7 
-5.9 ±1.7 


-12.3 ±1.1 
-14.3 ±0.8 
-12.7 ±2.5 


WFI J2033-4723 GP 

WFI J2033-4723 Lit [22] 


36.0 ±1.5 
35.5 ±1.4 


-26.3 ±1.7 
-27.1 +4.1/-2.3 


_ 


-62.0 ±2.3 
? 


: 


: 


B1608±656 GP 
B1608±656 Lit [24] 


31.7±2.1 
31.5 +2.0/-1.0 


-2 A ±2.2 

? 


-50.4 ±6.9 
? 


-35.0 ±4.0 
-36.0 ±1.5 


-77.5 ±7.1 
-77.0 +2.0/-1.0 


-44.4 ±5.4 
? 



TABLE II. Time delay estimations are compared between our GP analysis and values in the literature using different recon- 
struction methods. A question mark represents time delay estimates not provided by the literature, a dash indicates there is 
no fourth image. 



a novel cosmological probe. The complementarity, sub- 
stantially geometric nature, and disjoint systematics of 
this technique make its use a goal worth striving for. 

We have explored Gaussian processes as a regression 
method that is effectively model independent and we 
demonstrated robust results for both blind mock data 
and actual literature data, in many cases reducing the 
uncertainties of the time delay estimations. Noisy data, 
gaps in the observations, and extrinsic microlensing vari- 
ations can all be handled by the method. 

Robustness arises not just from the technique itself, 
but the ability to use multiple lightcurves simultaneously, 
and test results against different combinations. Several 
possibilities exist for further improvement. One could 
weight the estimations derived from different combina- 
tion of curves; one could remove unnecessary hyperpa- 
rameters to reduce estimation uncertainty while check- 
ing that the best fit does not shift; and of course one 
could obtain better data. Forthcoming surveys will find 
many more suitable lensing systems, allowing choice of 
the cleanest or best observed (with low photometric un- 
certainties, better cadence with fewer gaps, etc.). Simu- 
lated data challenges will also provide important training 



and assessment of the reconstruction method. 

While time delay estimation is just one element in the 
development of strong lensing distances as a new cosmo- 
logical probe, its improvement is key to this promising 
technique for mapping the Universe. Future work in- 
cludes applying our GP reconstruction method to studies 
of lensed supernovae or other variable sources. 
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